ESTIMATION OF PROPORTIONS USING LINEAR MAPS


1. INTRODUCTION

In this memorandum, a statistical population $\omega$ is considered which consists of
a mixture of members of two statistical populations $\omega_1$ and $\omega_2$ in proportions
$\alpha_1$ and $\alpha_2$. The problem of estimating $\alpha_1$ and $\alpha_2$ on the basis of an unlabeled
independent sample of observations on $\omega$ in n-dimensional Euclidean space $R^n$
is addressed. It is assumed that $\mu_1$ and $\mu_2$, the respective expected values
of observations on $\omega_1$ and $\omega_2$ in $R^n$, either are known or have been estimated
with satisfactory accuracy from a labeled sample of observations on $\omega_1$ and $\omega_2$.

In the following sections it is first shown that, by using a certain derivation,
one can obtain unbiased, consistent estimates* of $\alpha_1$ and $\alpha_2$ from almost any
linear map from $R^n$ to $R^1$. Then that linear map which yields estimates of
$\alpha_1$ and $\alpha_2$ having minimum variance among all estimates so obtained is sought. A
simple expression for the minimum-variance estimates is obtained by following
a line of reasoning analogous to that employed in the derivation of the Fisher
linear discriminant (ref. 1). The exact evaluation of these estimates requires
the use of the (usually unknown) covariance matrix of observations on $\omega$ in
$R^n$. In practice, a satisfactory approximation of these estimates can be
obtained by using the sample estimate of this covariance matrix instead.

2. ESTIMATION OF PROPORTIONS USING AN ALMOST
ARBITRARY LINEAR MAP

Suppose that F is any linear map from $R^n$ to $R^1$ such that $F(\mu_1) \neq F(\mu_2)$, and
suppose that $x = \{x_k\}_{k=1,\dots,N}$ is a sample of independent observations on $\omega$
in $R^n$. It is shown in the following that, from F, one can obtain unbiased,
consistent estimates of $\alpha_1$ and $\alpha_2$ based on x. For convenience, write (uniquely)
$F(x) = b^T x$ for appropriate $b \in R^n$, and denote by $\mu$ the expected value of
observations on $\omega$ in $R^n$.


*For convenience, estimates and their associated estimators are identified.




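From the facts that $\mu = \alpha_1\mu_1 + \alpha_2\mu_2$ and $\alpha_1 + \alpha_2 = 1$, one sees that

$$F(\mu) = F(\alpha_1\mu_1 + \alpha_2\mu_2) = \alpha_1 F(\mu_1) + \alpha_2 F(\mu_2) = \alpha_1\,[F(\mu_1) - F(\mu_2)] + F(\mu_2).$$

Since $F(\mu_1) \neq F(\mu_2)$, it follows that

$$\alpha_1 = \frac{F(\mu) - F(\mu_2)}{F(\mu_1) - F(\mu_2)}.$$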

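This suggests the estimates

$$\hat\alpha_1 = \frac{F(m) - F(\mu_2)}{F(\mu_1) - F(\mu_2)} \tag{1}$$

$$\hat\alpha_2 = 1 - \hat\alpha_1 \tag{2}$$

where

$$m = \frac{1}{N}\sum_{k=1}^{N} x_k.$$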

Since m is an unbiased and consistent estimate of $\mu$, it is easily verified that
the estimates given by eqs. (1) and (2) are unbiased and consistent. However,
if approximate values are given for $\mu_1$ and $\mu_2$, these estimates will be biased
accordingly.
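The estimates of eqs. (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not part of the original memorandum: the function name `estimate_proportions`, the choice of `b`, and all synthetic-data parameters (means, true proportion, unit-variance Gaussian noise, sample size) are assumptions made only for the example.

```python
import numpy as np

def estimate_proportions(x, mu1, mu2, b):
    """Estimate (alpha1, alpha2) from an unlabeled sample using F(x) = b^T x.

    x is an (N, n) array of observations; mu1, mu2, b are length-n vectors.
    """
    x = np.asarray(x, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = b @ (mu1 - mu2)              # F(mu1) - F(mu2)
    if np.isclose(denom, 0.0):
        raise ValueError("F(mu1) must differ from F(mu2)")
    m = x.mean(axis=0)                   # sample mean of the mixture
    alpha1 = (b @ (m - mu2)) / denom     # eq. (1)
    return alpha1, 1.0 - alpha1          # eq. (2)

# Synthetic two-component mixture (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
alpha1_true, N = 0.3, 20000
labels = rng.random(N) < alpha1_true                  # hidden labels, never used by the estimator
x = np.where(labels[:, None], mu1, mu2) + rng.normal(size=(N, 2))

a1, a2 = estimate_proportions(x, mu1, mu2, b=np.array([1.0, 0.0]))
print(a1, a2)
```

Any `b` with a nonzero component along `mu1 - mu2` would work here; the choice only affects the variance of the result, which is the subject of the next section.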

3. THE MINIMUM-VARIANCE ESTIMATES


One now determines which linear map F is "best" for use in the estimate given in 
eq. (1). That linear map for which the estimate given in eq. (1) has minimum 
variance among all such estimates is considered "best." 


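The variance of the estimate of eq. (1) is given by

$$\operatorname{Var}(\hat\alpha_1) = E\left(|\hat\alpha_1 - \alpha_1|^2\right) = E\left(\left|\frac{b^T(m - \mu)}{b^T(\mu_1 - \mu_2)}\right|^2\right) = \frac{b^T\,E\left((m - \mu)(m - \mu)^T\right)\,b}{[b^T(\mu_1 - \mu_2)]^2},$$

and, since the observations are independent, $E\left((m - \mu)(m - \mu)^T\right) = \Sigma/N$,
where $\Sigma = E\left((X - \mu)(X - \mu)^T\right)$ is the covariance matrix of observations on $\omega$
in $R^n$. It follows that

$$\operatorname{Var}(\hat\alpha_1) = \frac{1}{N}\,\frac{b^T \Sigma b}{[b^T(\mu_1 - \mu_2)]^2}. \tag{3}$$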
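As a small numerical companion to eq. (3), the sketch below evaluates the variance expression for a given `b`. The function name `estimator_variance` and the example values are hypothetical, and `sigma` is assumed to be the covariance matrix of the mixture population, not of either component.

```python
import numpy as np

def estimator_variance(b, mu1, mu2, sigma, n_obs):
    """Variance of the eq. (1) estimate for F(x) = b^T x, following eq. (3)."""
    b = np.asarray(b, dtype=float)
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (b @ sigma @ b) / (n_obs * (b @ d) ** 2)

# Illustrative values only (assumptions, not taken from the memorandum).
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
sigma = np.eye(2)
v = estimator_variance([1.0, 0.0], mu1, mu2, sigma, n_obs=100)
v_scaled = estimator_variance([5.0, 0.0], mu1, mu2, sigma, n_obs=100)
print(v, v_scaled)
```

The two values agree because eq. (3) is unchanged when `b` is multiplied by a nonzero scalar, so only the direction of `b` matters.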
It is evident from eq. (3) that choosing b to minimize $\operatorname{Var}(\hat\alpha_1)$ is equivalent
to choosing b to maximize the expression

$$\frac{[b^T(\mu_1 - \mu_2)]^2}{b^T \Sigma b} = \frac{b^T \Sigma\,[\Sigma^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T]\,b}{b^T \Sigma b}.$$


Since the operator $\Sigma^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ is symmetric with respect to the
inner product $\langle u, v\rangle = u^T \Sigma v$ on $R^n$, this is a (generalized) Rayleigh quotient.
It is maximized when b is an eigenvector of $\Sigma^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ corresponding
to the eigenvalue of largest absolute value. Now $\Sigma^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ is
an operator of rank 1, and the only eigenvector (up to a scalar multiple) having an
associated eigenvalue which is nonzero is $\Sigma^{-1}(\mu_1 - \mu_2)$. (The eigenvalue associated
with this eigenvector is $(\mu_1 - \mu_2)^T \Sigma^{-1}(\mu_1 - \mu_2)$.) It follows that the variance of the
estimate in eq. (1) is minimized when F is given by $F(x) = b^T x$, where
$b = \Sigma^{-1}(\mu_1 - \mu_2)$.
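The optimality argument can be checked numerically. The sketch below compares the quotient at $b = \Sigma^{-1}(\mu_1 - \mu_2)$ with its value at random directions. The helper names `rayleigh_quotient` and `optimal_direction`, the random positive-definite `sigma`, and the means are all assumptions chosen for illustration.

```python
import numpy as np

def rayleigh_quotient(b, d, sigma):
    """[b^T d]^2 / (b^T Sigma b), the quantity maximized in the text."""
    return (b @ d) ** 2 / (b @ sigma @ b)

def optimal_direction(mu1, mu2, sigma):
    """b = Sigma^{-1}(mu1 - mu2), computed without forming the inverse."""
    return np.linalg.solve(sigma, mu1 - mu2)

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 3.0 * np.eye(3)        # arbitrary symmetric positive-definite matrix
mu1 = np.array([1.0, 0.0, 2.0])
mu2 = np.array([0.0, 1.0, -1.0])
d = mu1 - mu2

b_opt = optimal_direction(mu1, mu2, sigma)
best = rayleigh_quotient(b_opt, d, sigma)
eigenvalue = d @ b_opt                   # (mu1 - mu2)^T Sigma^{-1} (mu1 - mu2)

# No random direction should beat the optimal one.
trials = rng.normal(size=(1000, 3))
worst_gap = min(best - rayleigh_quotient(t, d, sigma) for t in trials)
print(best, eigenvalue, worst_gap)
```

The maximum of the quotient equals the nonzero eigenvalue quoted in the text, which is why `best` and `eigenvalue` coincide up to rounding.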


With F chosen to minimize the variance of the estimate given by eq. (1), 
eqs. (1) and (2) can be written as 


$$\hat\alpha_1 = \frac{(\mu_1 - \mu_2)^T \Sigma^{-1}(m - \mu_2)}{(\mu_1 - \mu_2)^T \Sigma^{-1}(\mu_1 - \mu_2)} \tag{4}$$

and

$$\hat\alpha_2 = 1 - \hat\alpha_1 \tag{5}$$


The covariance matrix $\Sigma$ is likely to be unknown in most applications;
furthermore, since $\alpha_1$ and $\alpha_2$ are unknown, it cannot be determined from a
knowledge of $\mu_1$, $\mu_2$, and the covariance matrices for observations in $R^n$ on
$\omega_1$ and $\omega_2$. However, when $\Sigma$ is unknown, it can be replaced in eq. (4) by the
sample estimate

$$S = \frac{1}{N-1}\sum_{k=1}^{N} (x_k - m)(x_k - m)^T$$
to yield the estimates

$$\hat\alpha_1 = \frac{(\mu_1 - \mu_2)^T S^{-1}(m - \mu_2)}{(\mu_1 - \mu_2)^T S^{-1}(\mu_1 - \mu_2)}$$

and

$$\hat\alpha_2 = 1 - \hat\alpha_1.$$
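A minimal sketch of the sample-covariance version of the estimates follows. The function name `min_variance_proportions` and the synthetic data are assumptions for illustration. `np.cov` is used with its default `1/(N-1)` normalization; the normalization constant cancels in the ratio, so the result does not depend on it.

```python
import numpy as np

def min_variance_proportions(x, mu1, mu2):
    """Estimate (alpha1, alpha2) with Sigma replaced by the sample covariance S."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    m = x.mean(axis=0)                           # sample mean
    s = np.atleast_2d(np.cov(x, rowvar=False))   # sample covariance of the mixture
    b = np.linalg.solve(s, d)                    # S^{-1}(mu1 - mu2)
    alpha1 = (b @ (m - np.asarray(mu2, dtype=float))) / (b @ d)
    return alpha1, 1.0 - alpha1

# Synthetic mixture (illustrative assumptions: means, noise, proportion, size).
rng = np.random.default_rng(2)
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
alpha1_true, N = 0.3, 20000
labels = rng.random(N) < alpha1_true
x = np.where(labels[:, None], mu1, mu2) + rng.normal(size=(N, 2))

a1, a2 = min_variance_proportions(x, mu1, mu2)
print(a1, a2)
```

Solving the linear system with `np.linalg.solve` avoids forming $S^{-1}$ explicitly, which is generally the more stable choice when S is poorly conditioned.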